In environments with sparse rewards, deep reinforcement learning algorithms struggle to learn an optimal policy through interaction with the environment alone, so an intrinsic reward must be constructed to guide the algorithm's updates. However, existing approaches of this kind still have several problems: 1) statistically inaccurate state classification misjudges reward values, causing the agent to learn incorrect behaviors; 2) as the prediction network grows better at identifying state information, the state novelty captured by the intrinsic reward decays, which degrades learning of the optimal policy; 3) because state transitions are stochastic, the information in teacher policies is not utilized effectively, which weakens the agent's ability to explore the environment. To solve these problems, a reward construction mechanism named RGNP-HCE (Randomly Generated Network Prediction and Hash Count Exploration) is proposed, which combines the prediction error of a randomly generated network with hash discretization statistics; in addition, the knowledge of multiple teacher policies is transferred to a student policy through distillation. The RGNP-HCE mechanism constructs a fused reward based on the idea of classifying curiosity: a global curiosity reward is built from the randomly generated network's prediction error across multiple episodes, and a local curiosity reward is built from hash discretization statistics within a single episode, which guarantees the rationality of the intrinsic reward and the correctness of policy-gradient updates. In addition, multi-teacher policy distillation gives the student multiple reference directions for exploration, effectively improving the student policy's ability to explore the environment. Finally, the proposed mechanism was compared with four mainstream deep reinforcement learning algorithms in the Montezuma's Revenge and Breakout test environments, and policy distillation was then performed. The results show that in both test environments the average performance of the RGNP-HCE mechanism exceeds that of current high-performance deep reinforcement learning algorithms, and the distilled student policy improves the average performance further, indicating that the RGNP-HCE mechanism and policy distillation are effective in improving the agent's exploration ability.
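
The abstract does not include code; the following is a minimal Python sketch of how such a fused intrinsic reward could be assembled, assuming an RND-style frozen random target network for the cross-episode (global) term and a SimHash-style random-projection count for the within-episode (local) term. The class name `RGNPHCEReward` and all hyperparameters (`feat_dim`, `k`, `alpha`, `beta`, the learning rate) are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of a fused intrinsic reward: RND-style global curiosity
# plus SimHash-count local curiosity. Hyperparameters are illustrative.
import numpy as np
import torch
import torch.nn as nn

class RGNPHCEReward:
    """Global curiosity: prediction error against a frozen, randomly
    initialized target network. Local curiosity: count-based bonus over
    hash-discretized states, reset at every episode."""

    def __init__(self, obs_dim, feat_dim=64, k=16, alpha=1.0, beta=0.5):
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():   # random target net stays fixed
            p.requires_grad_(False)
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)
        self.proj = np.random.randn(k, obs_dim)  # random SimHash projection
        self.counts = {}                         # per-episode hash counts
        self.alpha, self.beta = alpha, beta

    def new_episode(self):
        self.counts.clear()                      # local term is per-episode

    def intrinsic_reward(self, obs):
        x = torch.as_tensor(obs, dtype=torch.float32)
        # Global curiosity: train the predictor toward the frozen target;
        # the residual error is high on states rarely seen across episodes.
        err = ((self.predictor(x) - self.target(x)) ** 2).mean()
        self.opt.zero_grad(); err.backward(); self.opt.step()
        # Local curiosity: discretize obs into a binary hash code and
        # reward states visited rarely within the current episode.
        code = tuple((self.proj @ np.asarray(obs, dtype=float) > 0).astype(int))
        self.counts[code] = self.counts.get(code, 0) + 1
        local = 1.0 / np.sqrt(self.counts[code])
        return self.alpha * float(err.detach()) + self.beta * local
```

The fused scalar returned here would simply be added to the environment reward before the policy-gradient update; how the paper actually weights and normalizes the two terms is not specified in the abstract.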
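For the multi-teacher distillation step, a similarly hedged sketch is possible: the student policy's action distribution is pulled toward each teacher's via a KL-divergence loss. The uniform averaging over teachers and the temperature `tau` are assumptions; the paper may weight or select teachers differently.

```python
# Hedged sketch of multi-teacher policy distillation via KL divergence.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, tau=1.0):
    """KL(teacher || student), averaged uniformly over teachers.

    student_logits: (batch, n_actions) from the student policy network.
    teacher_logits_list: list of (batch, n_actions) tensors, one per teacher.
    tau: softmax temperature (illustrative default).
    """
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    losses = []
    for t_logits in teacher_logits_list:
        p_teacher = F.softmax(t_logits / tau, dim=-1)
        # F.kl_div expects log-probs as input and probs as target.
        losses.append(F.kl_div(log_p_student, p_teacher,
                               reduction="batchmean"))
    return torch.stack(losses).mean()
```

In use, this term would typically be added to the student's reinforcement learning objective with a weighting coefficient, so that the multiple teachers supply reference directions for exploration while the student still optimizes its own return.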